An Improved Evolutionary Algorithm for Extractive Text Summarization
Abstract
The main challenge of extractive text summarization is selecting the top representative sentences from the input document. Several techniques have been proposed to improve this selection, such as feature-based, cluster-based, and graph-based methods. This paper extends a previous work and identifies limitations in that work's similarity calculation. It proposes an enhanced approach that mixes feature-based and cluster-based methods to produce a high-quality single-document summary. We used the Jaccard similarity measure to adjust the sentence clustering process instead of the Normalized Google Distance (NGD) similarity measure. In addition, this paper proposes a new real-to-integer value modulator to replace the genetic mutation operator adopted in the previous work. The Differential Evolution (DE) algorithm is used to train and test the proposed methods. The DUC2002 dataset was preprocessed and used as a test bed. The results show that our proposed differential mutant performed satisfactorily, while the genetic mutant proved to be the better of the two. In addition, our analysis of NGD similarity scores showed that NGD was an inappropriate choice in the previous study, because it is suited to very large databases such as Google. The Jaccard measure obtained results superior to those of NGD with both the newly proposed modulator and the genetic operator. Finally, both algorithms outperformed the standard baselines, the Microsoft Word Summarizer and Copernic.
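As a rough illustration of the Jaccard measure mentioned in the abstract, the sketch below computes the similarity of two sentences as the size of the intersection of their word sets divided by the size of their union. The function name `jaccard_similarity` and the tokenization (lowercasing and splitting on whitespace) are assumptions made for this example; the abstract does not describe the preprocessing actually used.

```python
def jaccard_similarity(sentence_a: str, sentence_b: str) -> float:
    """Jaccard similarity between the word sets of two sentences.

    Assumption: tokens are lowercased, whitespace-separated words.
    The paper's real preprocessing is not given in the abstract.
    """
    tokens_a = set(sentence_a.lower().split())
    tokens_b = set(sentence_b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        # Both sentences are empty: define the similarity as 0.0.
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    # Two of the four distinct words are shared.
    print(jaccard_similarity("the cat sat", "the cat ran"))
```

Because the score depends only on the two sentences being compared, it needs no external corpus statistics, unlike NGD, which relies on term counts from a large index.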
Similar resources
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted the attention of many researchers. Extractive-based text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing large-volume documents, the extr...
Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization
Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...
Evolutionary Algorithm for Extractive Text Summarization
Text summarization is the process of automatically creating a compressed version of a given document preserving its information content. There are two types of summarization: extractive and abstractive. Extractive summarization methods simplify the problem of summarization into the problem of selecting a representative subset of the sentences in the original documents. Abstractive summarization...
Semi-supervised extractive speech summarization via co-training algorithm
Supervised methods for extractive speech summarization require a large training set. Summary annotation is often expensive and time consuming. In this paper, we exploit semi-supervised approaches to leverage unlabeled data. In particular, we investigate co-training for the task of extractive meeting summarization. Compared with text summarization, speech summarization task has its unique charac...
Combining Optimal Clustering And Hidden Markov Models For Extractive Summarization
We propose Hidden Markov models with unsupervised training for extractive summarization. Extractive summarization selects salient sentences from documents to be included in a summary. Unsupervised clustering combined with heuristics is a popular approach because no annotated data is required. However, conventional clustering methods such as K-means do not take text cohesion into consideration. ...
Extractive Based Automatic Text Summarization
Automatic text summarization is the process of reducing the text content and retaining the important points of the document. Generally, there are two approaches for automatic text summarization: Extractive and Abstractive. The process of extractive based text summarization can be divided into two phases: pre-processing and processing. In this paper, we discuss some of the extractive based text ...